Near-optimal Distributions for Data Matrix Sampling

Authors

  • Dimitris Achlioptas
  • Zohar Karnin
Abstract

We give near-optimal distributions for the sparsification of large m × n matrices, where m ≪ n, for example representing n observations over m attributes. Our algorithms can be applied when the non-zero entries are only available as a stream, i.e., in arbitrary order, and result in matrices which are not only sparse, but whose values are also highly compressible. In particular, algebraic operations with the resulting matrices can be implemented as (ultra-efficient) operations over indices.
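The abstract does not state the sampling distributions themselves, so the following is only a minimal sketch of the general idea of entrywise sampling from a stream. It assumes plain L1 sampling (each entry is picked with probability proportional to its absolute value) as a stand-in, not the paper's distributions. The function name `l1_sparsify_stream`, the `(row, col, value)` stream format, and the sample budget `s` are all hypothetical, and each position is assumed to appear at most once in the stream.

```python
import random
from collections import Counter


def l1_sparsify_stream(entries, s, seed=None):
    """One-pass sketch: draw s samples with probability proportional to |value|.

    ASSUMPTION: plain L1 sampling, used only for illustration; this is not
    claimed to be the distribution derived in the paper.

    entries: iterable of (row, col, value) triples, in arbitrary order.
    Returns a dict {(row, col): value} describing the sparse sketch B.
    """
    rng = random.Random(seed)
    total = 0.0            # running sum of |value| over the entries seen so far
    slots = [None] * s     # s independent weighted reservoirs of size one
    for i, j, v in entries:
        w = abs(v)
        if w == 0.0:
            continue
        total += w
        keep_prob = w / total  # standard weighted-reservoir replacement rule
        sign = 1.0 if v > 0 else -1.0
        for k in range(s):
            if rng.random() < keep_prob:
                slots[k] = (i, j, sign)
    if total == 0.0:
        return {}
    counts = Counter(slots)  # how many of the s samples landed on each entry
    scale = total / s        # every stored value is a multiple of this scale
    return {(i, j): sign * c * scale for (i, j, sign), c in counts.items()}


if __name__ == "__main__":
    stream = [(0, 0, 3.0), (0, 1, -1.0), (1, 2, 4.0)]
    print(l1_sparsify_stream(stream, s=8, seed=0))
```

In this sketch every stored value is a signed integer multiple of a single scale factor, so a sketch B could be stored as small integer counts plus one float. Whether this matches the compressibility property the abstract refers to is not stated there.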

Related resources

Near-Optimal Entrywise Sampling for Data Matrices

We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ‖A − B‖₂. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they...

EIGENVECTORS OF COVARIANCE MATRIX FOR OPTIMAL DESIGN OF STEEL FRAMES

In this paper, the discrete method of eigenvectors of covariance matrix has been used for weight minimization of steel frame structures. The Eigenvectors of Covariance Matrix (ECM) algorithm is a robust, iterative method for solving optimization problems and is inspired by the CMA-ES method. Both methods use a covariance matrix in the optimization process, but the covariance matrix calcula...

Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning

Density-ratio estimation (i.e., estimating f = fQ/fP for two unknown distributions Q and P) has proved useful in many machine learning tasks, e.g., risk calibration in transfer learning and two-sample tests, as well as in common techniques such as importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rat...

Comparison of Optimal Design Methods in Inverse Problems.

Typical optimal design methods for inverse or parameter estimation problems are designed to choose optimal sampling distributions through minimization of a specific cost function related to the resulting error in parameter estimates. It is hoped that the inverse problem will produce parameter estimates with increased accuracy using data collected according to the optimal sampling distribution. ...

A Simple Approach to Optimal CUR Decomposition

Prior optimal CUR decomposition and near-optimal column reconstruction methods have been established by combining BSS sampling and adaptive sampling. In this paper, we propose a new approach to optimal CUR decomposition and near-optimal column reconstruction that uses only leverage score sampling. In our approach, neither BSS sampling nor adaptive sampling is needed. Moreover, our appr...

Publication date: 2013